2. CodeAct Makes LLMs Better Agents

2.1. What is CodeAct?

Figure 2: multi-turn interaction

https://github.com/xingyaoww/code-act/blob/main/figures/overview.png?raw=true

The agent writes sympy code.
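A minimal sketch of one multi-turn step, assuming the agent's action is a string of Python source and the observation is whatever the interpreter prints or raises. `execute_action` and `namespace` are hypothetical names for illustration, not the paper's implementation.

```python
import contextlib
import io
import traceback


def execute_action(code: str, namespace: dict) -> str:
    """Run one code action and return the observation text for the agent."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Assumption: errors are returned as part of the observation,
        # so the agent can revise its code on the next turn.
        return buffer.getvalue() + traceback.format_exc()
    return buffer.getvalue()


# The namespace persists across turns, like an interactive session.
namespace = {}
obs1 = execute_action("x = 6 * 7\nprint(y)", namespace)  # y is undefined
obs2 = execute_action("print(x)", namespace)             # x survived turn 1
```

Here the first turn fails with a NameError, and the second turn can still use `x` because the namespace is shared between turns.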

E. Example Prompt for CodeAct

Towards Unified Alignment Between Agents, Humans, and Environment

2.2. CodeAct Shows the Promise as a Strong Tool Use Framework

An experiment on which action format (text, JSON, or CodeAct) yields correct atomic tool calls.

Table A.6 gives examples of each format.
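A rough illustration of the three formats for one atomic call. The strings and the `add` tool below are made up for this note and are not copied from Table A.6. The parsers are only one hypothetical way to dispatch each format.

```python
import json


# Hypothetical tool, used only for illustration.
def add(a: float, b: float) -> float:
    return a + b


TOOLS = {"add": add}

# The same atomic call written in three action formats (illustrative strings).
text_action = "add, 1, 2"
json_action = '{"tool": "add", "args": [1, 2]}'
code_action = "add(1, 2)"


def run_text(action: str) -> float:
    # Text: first comma-separated field is the tool name, the rest are numbers.
    name, *raw_args = [part.strip() for part in action.split(",")]
    return TOOLS[name](*(float(x) for x in raw_args))


def run_json(action: str) -> float:
    # JSON: decode the object, then look up the tool and pass its arguments.
    call = json.loads(action)
    return TOOLS[call["tool"]](*call["args"])


def run_code(action: str) -> float:
    # Code: evaluate the expression with only the tools in scope.
    return eval(action, {"__builtins__": {}}, dict(TOOLS))


results = [run_text(text_action), run_json(json_action), run_code(code_action)]
```

For a single call the three formats carry the same information. The difference is only in what the model has to emit and what the environment has to parse.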

The authors hypothesize that CodeAct may be natural for LLMs because they see a large amount of code during training.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

The results are in Table 2.

For most LLMs, CodeAct achieves comparable or better performance even in atomic actions (the simplistic tool use scenario)

In Table 2, CodeAct is the best or second-best format.

The claim is that CodeAct works for both open and closed LLMs (the Best-performing totals).

For open-source LLMs, CodeAct improves performance (JSON ranks last).

For closed-source LLMs, JSON is effective.

2.3. CodeAct Gets More Done with Fewer Interactions

M3ToolEval, a benchmark prepared for this paper (Table A.7)

multiple calls to multiple tools in multi-turn interactions

F. M3ToolEval Prompt

Implementation? https://github.com/xingyaoww/code-act/blob/d607f56c9cfe9e8632ebaf65dcaf2b4b7fe1c6f8/scripts/eval/m3tooleval/main.py

Figure 1: Comparison between CodeAct and Text / JSON as action

https://github.com/xingyaoww/code-act/blob/main/figures/json-text-comparison.png?raw=true

Instruction

Determine the most cost-effective country to purchase the smartphone model "CodeAct 1". The countries to consider are the USA, Japan, Germany, and India.

Available APIs (five)

lookup_rates(country: str) -> (float, float)

convert_and_tax(price: float, exchange_rate: float, tax_rate: float) -> float

estimate_final_price(converted_price: float, shipping_cost: float) -> float

lookup_phone_price(model: str, country: str) -> float

estimate_shipping_cost(destination_country: str) -> float

The agent writes code as its action.

It iterates with a for loop.

It uses Python's built-in min().
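A sketch of what such a code action could look like, using the five API signatures listed above. The API bodies and all numbers are stubs invented for this note, and the order of the pair returned by `lookup_rates` (exchange rate, then tax rate) is an assumption. Only the loop and the `min()` call at the bottom correspond to the agent's action.

```python
from typing import Tuple

# Stub data: every number here is made up for illustration.
_PRICES = {"USA": 1000.0, "Japan": 140000.0, "Germany": 900.0, "India": 75000.0}
_RATES = {  # country -> (exchange rate to USD, tax rate); the order is assumed
    "USA": (1.0, 0.08),
    "Japan": (0.007, 0.10),
    "Germany": (1.1, 0.19),
    "India": (0.012, 0.18),
}
_SHIPPING = {"USA": 0.0, "Japan": 30.0, "Germany": 25.0, "India": 40.0}


def lookup_rates(country: str) -> Tuple[float, float]:
    return _RATES[country]


def lookup_phone_price(model: str, country: str) -> float:
    return _PRICES[country]  # stub: the model name is ignored


def convert_and_tax(price: float, exchange_rate: float, tax_rate: float) -> float:
    return price * exchange_rate * (1 + tax_rate)


def estimate_shipping_cost(destination_country: str) -> float:
    return _SHIPPING[destination_country]


def estimate_final_price(converted_price: float, shipping_cost: float) -> float:
    return converted_price + shipping_cost


# The code action: one turn covers all four countries.
countries = ["USA", "Japan", "Germany", "India"]
final_prices = {}
for country in countries:
    exchange_rate, tax_rate = lookup_rates(country)
    local_price = lookup_phone_price("CodeAct 1", country)
    converted = convert_and_tax(local_price, exchange_rate, tax_rate)
    shipping = estimate_shipping_cost(country)
    final_prices[country] = estimate_final_price(converted, shipping)

# Built-in min() picks the country with the lowest final price.
cheapest = min(final_prices, key=final_prices.get)
print(cheapest, final_prices[cheapest])
```

With text or JSON actions, each of these API calls would typically be a separate action, so the loop is where the saving in interactions comes from.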